Performance comparison of SNP detection tools with illumina exome sequencing data—an assessment using both family pedigree information and sample-matched SNP array data
نویسندگان
چکیده
To apply exome-seq-derived variants in the clinical setting, there is an urgent need to identify the best variant caller(s) from a large collection of available options. We have used an Illumina exome-seq dataset as a benchmark, with two validation scenarios--family pedigree information and SNP array data for the same samples, permitting global high-throughput cross-validation, to evaluate the quality of SNP calls derived from several popular variant discovery tools from both the open-source and commercial communities using a set of designated quality metrics. To the best of our knowledge, this is the first large-scale performance comparison of exome-seq variant discovery tools using high-throughput validation with both Mendelian inheritance checking and SNP array data, which allows us to gain insights into the accuracy of SNP calling through such high-throughput validation in an unprecedented way, whereas the previously reported comparison studies have only assessed concordance of these tools without directly assessing the quality of the derived SNPs. More importantly, the main purpose of our study was to establish a reusable procedure that applies high-throughput validation to compare the quality of SNP discovery tools with a focus on exome-seq, which can be used to compare any forthcoming tool(s) of interest.
منابع مشابه
Joint Variant and De Novo Mutation Identification on Pedigrees from High-Throughput Sequencing Data
The analysis of whole-genome or exome sequencing data from trios and pedigrees has been successfully applied to the identification of disease-causing mutations. However, most methods used to identify and genotype genetic variants from next-generation sequencing data ignore the relationships between samples, resulting in significant Mendelian errors, false positives and negatives. Here we presen...
متن کاملSmarter clustering methods for SNP genotype calling
MOTIVATION Most genotyping technologies for single nucleotide polymorphism (SNP) markers use standard clustering methods to 'call' the SNP genotypes. These methods are not always optimal in distinguishing the genotype clusters of a SNP because they do not take advantage of specific features of the genotype calling problem. In particular, when family data are available, pedigree information is i...
متن کاملReliability of algorithmic somatic copy number alteration detection from targeted capture data
Motivation Whole exome and gene panel sequencing are increasingly used for oncological diagnostics. To investigate the accuracy of SCNA detection algorithms on simulated and clinical tumor samples, the precision and sensitivity of four SCNA callers were measured using 50 simulated whole exome and 50 simulated targeted gene panel datasets, and using 119 TCGA tumor samples for which SNP array dat...
متن کاملGenome analysis CLAMMS: a scalable algorithm for calling common and rare copy number variants from exome sequencing data
Motivation: Several algorithms exist for detecting copy number variants (CNVs) from human exome sequencing read depth, but previous tools have not been well suited for large population studies on the order of tens or hundreds of thousands of exomes. Their limitations include being difficult to integrate into automated variant-calling pipelines and being ill-suited for detecting common variants....
متن کاملSequenza: allele-specific copy number and mutation profiles from tumor sequencing data
BACKGROUND Exome or whole-genome deep sequencing of tumor DNA along with paired normal DNA can potentially provide a detailed picture of the somatic mutations that characterize the tumor. However, analysis of such sequence data can be complicated by the presence of normal cells in the tumor specimen, by intratumor heterogeneity, and by the sheer size of the raw data. In particular, determinatio...
متن کامل